When building a bike network, it is important to understand where and why to build stations
Stations have a cost (maintenance, land value, etc.)
Asking the right questions of our data can aid in building a better network
BikeShare Riders in Oakland
Introduction of the dataset
The data comes from a public repository covering the San Francisco Bay Area and includes station, weather, and individual trip information
The data spans August 2013 to August 2015
Includes 669,959 bike trips across 5 cities (San Francisco, Redwood City, Mountain View, Palo Alto, and San Jose)
To reduce our data size we focused on trips during 2015 only
Graph Model
Trips have a relationship for starting and ending at a station
Weather conditions were binned based on distribution of average weather conditions for the 2014 year
Two subscription types exist (Customers and Subscribers)
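As a rough illustration of the trip-to-station relationships above, the sketch below counts trips per (start, end) station pair and optionally filters by subscription type. The station names, record layout, and the `build_station_graph` helper are assumptions made for this example, not the schema used in the project.

```python
from collections import Counter

# Hypothetical trip records: (start_station, end_station, subscription_type).
trips = [
    ("Station A", "Station B", "Subscriber"),
    ("Station A", "Station B", "Customer"),
    ("Station B", "Station C", "Subscriber"),
    ("Station C", "Station A", "Subscriber"),
]

def build_station_graph(trips, subscription_type=None):
    """Count trips per directed (start, end) pair, optionally for one rider type."""
    edges = Counter()
    for start, end, sub_type in trips:
        if subscription_type is None or sub_type == subscription_type:
            edges[(start, end)] += 1
    return edges

all_edges = build_station_graph(trips)
subscriber_edges = build_station_graph(trips, "Subscriber")
```

Each key is a directed edge between two stations and each value is the number of trips along it, which is the weighted form that later steps such as clustering or PageRank could consume.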
Bike Network Map
3 main clusters of stations
50 miles from San Francisco to San Jose
All stations are connected in the network
City to City Projection
Each city cluster is essentially a disconnected component due to the cost of traversing the distance between cities
Only 2 trips were taken from San Francisco to other cities during the period
We reduced our data size by focusing on San Francisco going forward
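One way to picture the city-to-city projection is to roll station-level trip counts up to the city of each endpoint. The sketch below does this with a hypothetical station-to-city lookup and made-up counts; the names `station_city`, `station_trips`, and `project_to_cities` are illustrative only.

```python
from collections import Counter

# Hypothetical lookup from station to city, and hypothetical trip counts.
station_city = {
    "SF 1": "San Francisco",
    "SF 2": "San Francisco",
    "PA 1": "Palo Alto",
    "PA 2": "Palo Alto",
}
station_trips = {
    ("SF 1", "SF 2"): 120,
    ("SF 1", "PA 1"): 2,
    ("PA 1", "PA 2"): 30,
}

def project_to_cities(station_trips, station_city):
    """Aggregate station-to-station counts into city-to-city counts."""
    city_trips = Counter()
    for (start, end), count in station_trips.items():
        city_trips[(station_city[start], station_city[end])] += count
    return city_trips

city_trips = project_to_cities(station_trips, station_city)
# Keep only edges that cross a city boundary.
cross_city = {pair: n for pair, n in city_trips.items() if pair[0] != pair[1]}
```

When nearly all of the weight sits on same-city pairs and `cross_city` is almost empty, each city behaves like its own component, which is the pattern described above.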
Station Usage
The two San Francisco Caltrain stations share a location and together account for the most usage
These stations are at a major train station for the area
Trips Over Time
Low points for subscribers are the weekends
High points for customers are the weekends
Red dots are holidays
Theory: Subscribers are mainly working professionals using the system to get to work, and customers are predominantly tourists
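The weekday versus weekend pattern can be checked with a simple share calculation per subscription type. The sketch below is a minimal version under assumed inputs: the trip dates are invented, and `weekend_share` is a hypothetical helper rather than the analysis used for the chart.

```python
from collections import defaultdict
from datetime import date

# Hypothetical trips: (trip_date, subscription_type).
# 2 March 2015 is a Monday; 7 and 8 March 2015 are a Saturday and Sunday.
trips = [
    (date(2015, 3, 2), "Subscriber"),
    (date(2015, 3, 3), "Subscriber"),
    (date(2015, 3, 4), "Subscriber"),
    (date(2015, 3, 7), "Subscriber"),
    (date(2015, 3, 4), "Customer"),
    (date(2015, 3, 7), "Customer"),
    (date(2015, 3, 8), "Customer"),
]

def weekend_share(trips):
    """Return the fraction of each rider type's trips that fall on a weekend."""
    totals = defaultdict(lambda: [0, 0])  # [weekend trips, all trips]
    for day, sub_type in trips:
        totals[sub_type][1] += 1
        if day.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
            totals[sub_type][0] += 1
    return {sub: weekend / total for sub, (weekend, total) in totals.items()}

shares = weekend_share(trips)
```

A noticeably higher weekend share for customers than for subscribers would be consistent with the commuter and tourist theory, though it would not prove it.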
Louvain Clustering
The yellow cluster seems focused around the financial district (Includes San Francisco Caltrain - Townsend at 4th)
The blue cluster primarily follows the rail line (Includes San Francisco Caltrain 2 - 330 Townsend)
The orange cluster is focused on the waterfront
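Louvain clustering groups nodes by greedily improving modularity. The sketch below does not implement Louvain itself; it only computes the modularity score of a given partition on an undirected weighted graph, to show what a "good" cluster assignment means. The toy graph and the `modularity` helper are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition.

    edges: {(u, v): weight}, undirected, each pair listed once, no self-loops.
    community: {node: community label}.
    """
    total_weight = sum(edges.values())
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed node degrees per community
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )

# Hypothetical toy graph: two triangles joined by a single edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the toy graph into its two triangles scores higher than lumping every node together, which is the kind of improvement Louvain searches for when it groups stations.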
PageRank
PageRank by Membership Type
Top stations in each subset follow patterns similar to our clustering analysis
Top PageRank scores for customers follow the stations near the waterfront
Top subscriber scores correspond to stations in the financial district and along the light rail line
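To make the per-membership PageRank idea concrete, the sketch below runs a plain power-iteration PageRank over a weighted directed trip graph. It is a minimal version under assumed inputs, not the tooling used for the results above; the damping factor of 0.85, the iteration count, and the toy edges are all assumptions. Running it once per subscription type on that type's trips would give the two rankings being compared.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges: {(source, destination): trip count} for a directed graph.
    """
    nodes = sorted({node for pair in edges for node in pair})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing trips is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank

# Hypothetical trips for one rider type: most trips end at station "C".
edges = {("A", "C"): 5, ("B", "C"): 3, ("C", "A"): 1}
scores = pagerank(edges)
top_station = max(scores, key=scores.get)
```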
Conclusion
We need more data in order to conclusively prove these patterns are due to the workforce and tourism, but at a glance it seems to be the case
Building a bike share network is a complex task and understanding ridership is important for ensuring it serves the goals of our organization
Future Work:
Bring in additional data (San Francisco open data on businesses)
Build an application for riders and use a predictive model to suggest locations in the city and advise what the closest station is
Ask more questions of our data (Is there a minimum distance by which we should separate stations? Are stations connected to designated bike lanes more popular? What stores, destinations, and public transit are near popular stations?)